An Introduction to py-Goldsberry

py-Goldsberry is a Python package that makes it easy to interface with http://stats.nba.com and retrieve its data in a format better suited to analysis.

This is the first in a series of tutorials that walk through the different modules of the package and show how to use each one to get different types of data.

If you've made it this far, you're probably less interested in reading about the package and more interested in actually using it.

Installation

If you don't have the package installed, use pip install to get it; add --upgrade to move an existing installation to the latest version.

pip install py-goldsberry
pip install --upgrade py-goldsberry

When you have py-goldsberry installed, you can load the package and check the version number:


In [1]:
import goldsberry
import pandas as pd
goldsberry.__version__


Out[1]:
'0.6.0'

py-goldsberry is designed to work in conjunction with Pandas. Each function within the package returns data in a format that is easily converted to a Pandas DataFrame.
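As a rough illustration of what "easily converted" can look like, the sketch below assumes the package hands back a list of dictionaries, one per row. That return type is an assumption, not something this tutorial confirms. The two records are hand-copied from the PlayerList output shown later and trimmed to three columns.

```python
import pandas as pd

# Assumption: a goldsberry call returns a list of dicts, one dict per row.
# These records are hand-copied stand-ins, not a live API response.
records = [
    {"DISPLAY_LAST_COMMA_FIRST": "Acy, Quincy", "PERSON_ID": 203112, "TEAM_ABBREVIATION": "NYK"},
    {"DISPLAY_LAST_COMMA_FIRST": "Adams, Jordan", "PERSON_ID": 203919, "TEAM_ABBREVIATION": "MEM"},
]

# pd.DataFrame turns each dict into a row and each key into a column.
df = pd.DataFrame(records)
print(df.shape)
```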

To get started, let's get a list of all of the players who were on an NBA roster during the 2014-15 season.

PlayerIDs


In [2]:
players2014 = goldsberry.PlayerList(2014)
players2014 = pd.DataFrame(players2014)
players2014.head()


Out[2]:
DISPLAY_LAST_COMMA_FIRST FROM_YEAR PERSON_ID PLAYERCODE ROSTERSTATUS TEAM_ABBREVIATION TEAM_CITY TEAM_CODE TEAM_ID TEAM_NAME TO_YEAR
0 Acy, Quincy 2012 203112 quincy_acy 1 NYK New York knicks 1610612752 Knicks 2015
1 Adams, Jordan 2014 203919 jordan_adams 1 MEM Memphis grizzlies 1610612763 Grizzlies 2015
2 Adams, Steven 2013 203500 steven_adams 1 OKC Oklahoma City thunder 1610612760 Thunder 2015
3 Adrien, Jeff 2010 202399 jeff_adrien 0 0 2014
4 Afflalo, Arron 2007 201167 arron_afflalo 1 POR Portland blazers 1610612757 Trail Blazers 2015

If you want the players who were on an NBA roster during the 1990-91 season, pass 1990 to goldsberry.PlayerList():


In [3]:
players1990 = goldsberry.PlayerList(1990)
players1990 = pd.DataFrame(players1990)
players1990.head()


Out[3]:
DISPLAY_LAST_COMMA_FIRST FROM_YEAR PERSON_ID PLAYERCODE ROSTERSTATUS TEAM_ABBREVIATION TEAM_CITY TEAM_CODE TEAM_ID TEAM_NAME TO_YEAR
0 Alarie, Mark 1986 76019 HISTADD_mark_alarie 1 WAS Washington wizards 1610612764 Bullets 1990
1 Alford, Steve 1987 76024 HISTADD_steve_alford 1 DAL Dallas mavericks 1610612742 Mavericks 1990
2 Ball, Cedric 1990 76090 HISTADD_cedric_ball 1 LAC Los Angeles clippers 1610612746 Clippers 1990
3 Bannister, Ken 1984 76094 HISTADD_ken_bannister 1 LAC Los Angeles clippers 1610612746 Clippers 1990
4 Butler, Greg 1988 76320 HISTADD_gregory_butler 1 LAC Los Angeles clippers 1610612746 Clippers 1990

You can pass any year to the PlayerList() function to get the roster of players from that season. Alternatively, you may want a list of every player who has been on an NBA roster at any point in the history of the league. You can retrieve this list by passing AllTime=True to the PlayerList() function.


In [4]:
players_alltime = goldsberry.PlayerList(AllTime=True)
players_alltime = pd.DataFrame(players_alltime)
players_alltime.sample(10)


Out[4]:
DISPLAY_LAST_COMMA_FIRST FROM_YEAR PERSON_ID PLAYERCODE ROSTERSTATUS TEAM_ABBREVIATION TEAM_CITY TEAM_CODE TEAM_ID TEAM_NAME TO_YEAR
1259 Gibson, Dee 1949 76808 HISTADD_gibby_gibson 0 0 1949
2516 Molinas, Jack 1953 77624 HISTADD_jack_molinas 0 0 1953
3558 Thabeet, Hasheem 2009 201934 hasheem_thabeet 0 0 2013
3866 White, Herb 1970 78508 HISTADD_herb_white 0 0 1970
1534 Hayes, Steve 1981 76980 HISTADD_steve_hayes 0 0 1985
3723 Vetra, Gundars 1992 78415 HISTADD_gundars_vetra 0 0 1992
579 Carter, Jake 1949 76359 HISTADD_jake_carter 0 0 1949
2667 Noel, David 2006 200786 david_noel 0 0 2006
491 Burke, Trey 2013 203504 trey_burke 1 UTA Utah jazz 1610612762 Jazz 2015
863 Delfino, Carlos 2004 2568 carlos_delfino 0 0 2013

I sampled 10 random players from the all-time list to illustrate that it contains a mix of historic and current NBA players.

The PlayerList() function is critical to the usage of other parts of the package. If you are interested in player level data, I highly recommend creating a list of players that you are interested in by using this function. You can refer to this list later.
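One way to "refer to this list later" is a small lookup helper that returns PERSON_ID values by name. The sketch below is only illustrative: find_player_ids is a hypothetical helper, not part of py-goldsberry, and the DataFrame is a hand-built stand-in using three rows from the 2014-15 output above.

```python
import pandas as pd

# Stand-in for the DataFrame built from goldsberry.PlayerList(); rows copied
# from the 2014-15 output above and trimmed to three columns.
players = pd.DataFrame({
    "DISPLAY_LAST_COMMA_FIRST": ["Acy, Quincy", "Adams, Jordan", "Adams, Steven"],
    "PERSON_ID": [203112, 203919, 203500],
    "TEAM_ABBREVIATION": ["NYK", "MEM", "OKC"],
})

def find_player_ids(players, name_fragment):
    """Hypothetical helper: PERSON_IDs whose display name contains name_fragment."""
    mask = players["DISPLAY_LAST_COMMA_FIRST"].str.contains(
        name_fragment, case=False, regex=False
    )
    return players.loc[mask, "PERSON_ID"].tolist()

adams_ids = find_player_ids(players, "adams")
print(adams_ids)
```

A plain substring match can return several players, so it is worth checking the result before passing an ID on to another part of the package.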

GameIDs

One of the major modules of py-goldsberry is the game module. Within that module lies a set of classes that extracts information at a game level. There are two key sub-types of data in the module, box score and play-by-play. To access this data, you will need a specific GameID.

These GameIDs are not straightforward to find through the stats.nba.com website. py-goldsberry has a built-in function that links to a table I have created containing all of the GameIDs from the first game in NBA history through the end of the 2014-15 season.

To access this table of GameIDs, use the GameIDs() function.


In [5]:
gameids = goldsberry.GameIDs()
gameids = pd.DataFrame(gameids)
gameids.sample(10)


Out[5]:
       GAMECODE         GAME_DATE_EST        GAME_ID     HOME_TEAM_ID  SEASON  VISITOR_TEAM_ID
22619  19850412/CLENYK  1985-04-12T00:00:00  0028400929  1610612752    1984    1610612739
52372  20090221/OKCGSW  2009-02-21T00:00:00  0020800826  1610612744    2008    1610612760
2998   19541030/FTWMIH  1954-10-30T00:00:00  0025400003  1610612737    1954    1610612765
60300  20150102/DALBOS  2015-01-02T00:00:00  0021400486  1610612738    2014    1610612742
27344  19900119/LALMIL  1990-01-19T00:00:00  0028900497  1610612749    1989    1610612747
22587  19850406/PHLIND  1985-04-06T00:00:00  0028400894  1610612754    1984    1610612755
1977   19511127/MIHROC  1951-11-27T00:00:00  0025100066  1610612758    1951    1610612737
54801  20101217/PHXDAL  2010-12-17T00:00:00  0021000390  1610612742    2010    1610612756
48135  20060307/HOUMIN  2006-03-07T00:00:00  0020500893  1610612750    2005    1610612745
44109  20030406/UTASEA  2003-04-06T00:00:00  0020201112  1610612760    2002    1610612762

This table is fairly raw at this point. I'm still in the process of augmenting and making the data more easily searchable. For now, it may make sense to filter by a specific season or date. In the GAMECODE column, the code breaks down into the date followed by the initials of the two teams involved.
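The GAMECODE layout described above can be unpacked with a few lines of standard-library Python. This is a minimal sketch: parse_gamecode is a hypothetical helper, and the visitor-first ordering is an inference from the sample rows (for example, 20150102/DALBOS has the home team ID that the team table later in this tutorial lists as Boston), not a documented rule. It also assumes both abbreviations are exactly three letters.

```python
from datetime import date, datetime

def parse_gamecode(gamecode):
    """Split a GAMECODE such as '20150102/DALBOS' into (date, visitor, home)."""
    date_part, teams = gamecode.split("/")
    game_date = datetime.strptime(date_part, "%Y%m%d").date()
    # Assumption: two three-letter abbreviations, visitor first, then home.
    return game_date, teams[:3], teams[3:]

game_date, visitor, home = parse_gamecode("20150102/DALBOS")
print(game_date, visitor, home)
```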

As with PlayerIDs, this table will likely be used fairly often. It is best to pull the list of games into an object at the very beginning of the analysis so it is easy to filter later.

TeamIDs

A third module, team, requires the use of unique teamIDs. I'm still in the process of building a simple way to arrive at a searchable table, but you can get a list of IDs (not matched to team names) by filtering the gameids table we just created.


In [6]:
filter_season = '2014'
# Keep the home-team IDs from the chosen season, one row per team
teamids = gameids.loc[gameids['SEASON'] == filter_season, 'HOME_TEAM_ID'].drop_duplicates()
teamids.head()


Out[6]:
59697    1610612748
59698    1610612739
59699    1610612761
59700    1610612738
59701    1610612737
Name: HOME_TEAM_ID, dtype: int64

The SEASON column is stored as strings, so either pass the year you want to filter by as a string or convert the column to a numeric type before you filter.
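Both options can be sketched on a small stand-in table. The rows below are copied from the sample gameids output and trimmed to three columns; the point is that comparing a string column against an integer raises no error, it just matches nothing.

```python
import pandas as pd

# Stand-in for the gameids table; rows copied from the sample output above.
gameids = pd.DataFrame({
    "GAME_ID": ["0021400486", "0020800826", "0028400929"],
    "HOME_TEAM_ID": [1610612738, 1610612744, 1610612752],
    "SEASON": ["2014", "2008", "1984"],  # stored as strings
})

# Comparing strings to an integer fails silently: no rows match.
silent_miss = gameids[gameids["SEASON"] == 2014]

# Option 1: compare against a string.
as_string = gameids[gameids["SEASON"] == "2014"]

# Option 2: convert the column once, then compare against an integer.
gameids["SEASON"] = pd.to_numeric(gameids["SEASON"])
as_number = gameids[gameids["SEASON"] == 2014]

print(len(silent_miss), len(as_string), len(as_number))
```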

While this list covers every unique teamID for the 2014-15 season, it is not matched with team names, so it is not as useful as it could be. We can use one of the classes within the team module to get some additional information and, with a few lines of code, have a more descriptive table of teamIDs.

We'll start by getting information for a single team. Then we'll put together a loop that creates a searchable/sortable dataframe.


In [7]:
teaminfo = goldsberry.team.team_info(teamids.iloc[0])

In [8]:
pd.DataFrame(teaminfo.info())


Out[8]:
CONF_RANK DIV_RANK L MAX_YEAR MIN_YEAR PCT SEASON_YEAR TEAM_ABBREVIATION TEAM_CITY TEAM_CODE TEAM_CONFERENCE TEAM_DIVISION TEAM_ID TEAM_NAME W
0 10 3 45 2015 1988 0.451 2014-15 MIA Miami heat East Southeast 1610612748 Heat 37

As you can see above, calling the team_info() class within the team module returns an object, which we saved as teaminfo. To get the actual data, we call the info() method of that object. This is the standard pattern for almost all of py-goldsberry. The package is built this way to minimize the number of calls that need to be made to the NBA servers while returning the maximum amount of data.

In general, all calls are classes. Each class has methods associated with the variety of data that is retrieved when a unique call is made to the NBA website. When you save each class as an object, you immediately make a call to the website and the data which is retrieved is stored within the object and accessible through the use of object specific methods. If that doesn't make sense, don't worry. Just keep following the tutorials and you'll get the hang of how to use it without necessarily needing to understand the underlying mechanics.
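If a concrete picture helps, here is a toy version of that pattern. Everything in it is hypothetical: TeamInfoSketch, fake_fetch, and the extra() method are stand-ins invented for illustration and are not the package's real classes or methods. The only point is that the request happens once, when the object is created, and each method reads from the stored result.

```python
calls = []  # records every pretend request so we can count them

def fake_fetch(team_id):
    """Pretend network request returning several tables in one payload."""
    calls.append(team_id)
    return {
        "info": [{"TEAM_ID": team_id, "TEAM_ABBREVIATION": "MIA", "W": 37, "L": 45}],
        "extra": [{"TEAM_ID": team_id, "note": "placeholder second table"}],
    }

class TeamInfoSketch:
    """Hypothetical stand-in mirroring the pattern, not the package's real class."""

    def __init__(self, team_id, fetch):
        # The single request happens here, at construction time.
        self._payload = fetch(team_id)

    def info(self):
        # Methods only read the stored payload; no further requests are made.
        return self._payload["info"]

    def extra(self):
        return self._payload["extra"]

team = TeamInfoSketch(1610612748, fake_fetch)
first = team.info()
second = team.extra()
print(len(calls))
```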

After that brief digression, back to creating a table of teamIDs with rich information. We can create a nice table with a simple loop that gathers information on each team and merges it into a single dataframe.


In [9]:
teamids_full = pd.DataFrame() # Create empty Data Frame
for i in teamids.values:
    team = goldsberry.team.team_info(i)
    teamids_full = pd.concat([teamids_full, pd.DataFrame(team.info())])

In [10]:
teamids_full


Out[10]:
CONF_RANK DIV_RANK L MAX_YEAR MIN_YEAR PCT SEASON_YEAR TEAM_ABBREVIATION TEAM_CITY TEAM_CODE TEAM_CONFERENCE TEAM_DIVISION TEAM_ID TEAM_NAME W
0 10 3 45 2015 1988 0.451 2014-15 MIA Miami heat East Southeast 1610612748 Heat 37
0 2 1 29 2015 1970 0.646 2014-15 CLE Cleveland cavaliers East Central 1610612739 Cavaliers 53
0 4 1 33 2015 1995 0.598 2014-15 TOR Toronto raptors East Atlantic 1610612761 Raptors 49
0 7 2 42 2015 1946 0.488 2014-15 BOS Boston celtics East Atlantic 1610612738 Celtics 40
0 1 1 22 2015 1949 0.732 2014-15 ATL Atlanta hawks East Southeast 1610612737 Hawks 60
0 3 2 32 2015 1966 0.610 2014-15 CHI Chicago bulls East Central 1610612741 Bulls 50
0 14 5 61 2015 1948 0.256 2014-15 LAL Los Angeles lakers West Pacific 1610612747 Lakers 21
0 9 4 44 2015 1976 0.463 2014-15 IND Indiana pacers East Central 1610612754 Pacers 38
0 8 3 44 2015 1976 0.463 2014-15 BKN Brooklyn nets East Atlantic 1610612751 Nets 38
0 12 5 50 2015 1948 0.390 2014-15 DET Detroit pistons East Central 1610612765 Pistons 32
0 7 4 32 2015 1980 0.610 2014-15 DAL Dallas mavericks West Southwest 1610612742 Mavericks 50
0 11 3 44 2015 1974 0.463 2014-15 UTA Utah jazz West Northwest 1610612762 Jazz 38
0 13 4 53 2015 1948 0.354 2014-15 SAC Sacramento kings West Pacific 1610612758 Kings 29
0 3 2 26 2015 1970 0.683 2014-15 LAC Los Angeles clippers West Pacific 1610612746 Clippers 56
0 14 4 64 2015 1949 0.220 2014-15 PHI Philadelphia sixers East Atlantic 1610612755 76ers 18
0 8 5 37 2015 2002 0.549 2014-15 NOP New Orleans pelicans West Southwest 1610612740 Pelicans 45
0 6 3 41 2015 1968 0.500 2014-15 MIL Milwaukee bucks East Central 1610612749 Bucks 41
0 12 4 52 2015 1976 0.366 2014-15 DEN Denver nuggets West Northwest 1610612743 Nuggets 30
0 10 3 43 2015 1968 0.476 2014-15 PHX Phoenix suns West Pacific 1610612756 Suns 39
0 2 1 26 2015 1967 0.683 2014-15 HOU Houston rockets West Southwest 1610612745 Rockets 56
0 4 1 31 2015 1970 0.622 2014-15 POR Portland blazers West Northwest 1610612757 Trail Blazers 51
0 11 4 49 2015 1988 0.402 2014-15 CHA Charlotte hornets East Southeast 1610612766 Hornets 33
0 15 5 66 2015 1989 0.195 2014-15 MIN Minnesota timberwolves West Northwest 1610612750 Timberwolves 16
0 5 2 27 2015 1995 0.671 2014-15 MEM Memphis grizzlies West Southwest 1610612763 Grizzlies 55
0 5 2 36 2015 1961 0.561 2014-15 WAS Washington wizards East Southeast 1610612764 Wizards 46
0 15 5 65 2015 1946 0.207 2014-15 NYK New York knicks East Atlantic 1610612752 Knicks 17
0 9 2 37 2015 1967 0.549 2014-15 OKC Oklahoma City thunder West Northwest 1610612760 Thunder 45
0 13 5 57 2015 1989 0.305 2014-15 ORL Orlando magic East Southeast 1610612753 Magic 25
0 1 1 15 2015 1946 0.817 2014-15 GSW Golden State warriors West Pacific 1610612744 Warriors 67
0 6 3 27 2015 1976 0.671 2014-15 SAS San Antonio spurs West Southwest 1610612759 Spurs 55
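The loop above grows the dataframe on every pass, which is why every row ends up with index 0. A common alternative is to collect the pieces in a list and concatenate once. The sketch below assumes info() returns a list of dicts and uses fake_team_info, a hypothetical stand-in for the real call, with three rows copied from the table above so that it runs without touching the NBA servers.

```python
import pandas as pd

def fake_team_info(team_id):
    """Hypothetical stand-in for goldsberry.team.team_info(team_id).info()."""
    rows = {
        1610612748: {"TEAM_ID": 1610612748, "TEAM_ABBREVIATION": "MIA", "TEAM_NAME": "Heat", "W": 37, "L": 45},
        1610612739: {"TEAM_ID": 1610612739, "TEAM_ABBREVIATION": "CLE", "TEAM_NAME": "Cavaliers", "W": 53, "L": 29},
        1610612761: {"TEAM_ID": 1610612761, "TEAM_ABBREVIATION": "TOR", "TEAM_NAME": "Raptors", "W": 49, "L": 33},
    }
    return [rows[team_id]]

team_ids = [1610612748, 1610612739, 1610612761]

# Build one small frame per team, then concatenate a single time.
frames = [pd.DataFrame(fake_team_info(i)) for i in team_ids]
teams = pd.concat(frames, ignore_index=True)  # fresh 0..n-1 index

# Sort by wins, and build a TEAM_ID -> TEAM_NAME lookup.
by_wins = teams.sort_values("W", ascending=False)
name_for_id = teams.set_index("TEAM_ID")["TEAM_NAME"]
print(name_for_id[1610612761])
```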

Now you have three tables of highly valuable information for utilizing the rest of the package: players_alltime, gameids, and teamids_full.

If you feel comfortable with what we have so far, go forth and collect data! If you want a bit more help, check out some of the other tutorials I've put together.